94 research outputs found

    Semi-automated Ontology Generation for Biocuration and Semantic Search

    Background: In the life sciences, the amount of literature and experimental data grows at a tremendous rate. In order to effectively access and integrate these data, biomedical ontologies – controlled, hierarchical vocabularies – are being developed. Creating and maintaining such ontologies is a difficult, labour-intensive, manual process. Many computational methods which can support ontology construction have been proposed in the past. However, good, validated systems are largely missing.
    Motivation: The biocuration community plays a central role in the development of ontologies. Any method that can support their efforts has the potential to have a huge impact in the life sciences. Recently, a number of semantic search engines were created that make use of biomedical ontologies for document retrieval. To transfer the technology to other knowledge domains, suitable ontologies need to be created. One area where ontologies may prove particularly useful is the search for alternative methods to animal testing, an area where comprehensive search is of special interest to determine the availability or unavailability of alternative methods.
    Results: The Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG) developed in this thesis is a system which supports the creation and extension of ontologies by semi-automatically generating terms, definitions, and parent-child relations from text in PubMed, the web, and PDF repositories. The system is seamlessly integrated into OBO-Edit and Protégé, two widely used ontology editors in the life sciences. DOG4DAG generates terms by identifying statistically significant noun-phrases in text. For definitions and parent-child relations it employs pattern-based web searches. Each generation step has been systematically evaluated using manually validated benchmarks. The term generation leads to high-quality terms also found in manually created ontologies. Definitions can be retrieved for up to 78% of terms, child-ancestor relations for up to 54%. No other validated system exists that achieves comparable results. To improve the search for information on alternative methods to animal testing, an ontology has been developed that contains 17,151 terms, of which 10% were newly created and 90% were re-used from existing resources. This ontology is the core of Go3R, the first semantic search engine in this field. When a user performs a search query with Go3R, the search engine expands this request using the structure and terminology of the ontology. The machine classification employed in Go3R is capable of distinguishing documents related to alternative methods from those which are not with an F-measure of 90% on a manual benchmark. Approximately 200,000 of the 19 million documents listed in PubMed were identified as relevant, either because a specific term was contained or due to the automatic classification. The Go3R search engine is available online at www.Go3R.org

    Dynamics of wind turbine operational states

    Modern wind turbines gather an abundance of data with their Supervisory Control And Data Acquisition (SCADA) system. We study the short-term mutual dependencies of a variety of observables (e.g. wind speed, generated power and current, rotation frequency) by evaluating Pearson correlation matrices on a moving time window. The analysis of short-term correlations is made possible by high-frequency SCADA data. The resulting time series of correlation matrices exhibits non-stationarity in the mutual dependencies of different measurements at a single turbine. Using cluster analysis on these matrices, multiple stable operational states are found. They show distinct correlation structures, which represent different turbine control settings. The current system state is linked to external factors interacting with the control system of the wind turbine. For example, at sufficiently high wind speeds, the state represents the behavior for rated power production. Moreover, we combine the clustering with stochastic process analysis to study the dynamics of those states in more detail. Calculating the distances between correlation matrices, we obtain a time series that describes the behavior of the complex system in a collective way. Assuming this time series to be a stochastic process governed by a Langevin equation, we estimate the drift and diffusion terms to understand the underlying dynamics. The drift term, which describes the deterministic behavior of the system, is used to obtain a potential. Dips in the potential are identified with the cluster states. We study the dynamics of operational states and their transitions by analyzing the development of the potential over time and wind speed. Thereby, we further characterize the different states and discuss consequences for the analysis of high-frequency wind turbine data.
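The moving-window correlation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window and step parameters, and the choice of the Frobenius norm as the distance between matrices are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant signal
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(rows: Sequence[Sequence[float]]) -> Matrix:
    """Correlation matrix of the observables (columns) within one window."""
    columns = list(zip(*rows))
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]


def rolling_correlation_matrices(
    rows: Sequence[Sequence[float]], window: int, step: int = 1
) -> List[Matrix]:
    """One correlation matrix per window position along the time axis."""
    if not 2 <= window <= len(rows):
        raise ValueError("window must be between 2 and the number of samples")
    return [
        correlation_matrix(rows[start:start + window])
        for start in range(0, len(rows) - window + 1, step)
    ]


def frobenius_distance(a: Matrix, b: Matrix) -> float:
    """Frobenius distance between two matrices (an assumed distance measure)."""
    return math.sqrt(
        sum((x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    )
```

Each row of the input is one time stamp and each column one observable (for example wind speed or generated power). The resulting list of matrices is what a clustering step would take as input, and the distances of each matrix to a reference matrix give a scalar time series of the kind the abstract feeds into the Langevin analysis.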

    The Neural Correlates of Face-Voice-Integration in Social Anxiety Disorder

    Faces and voices are very important sources of threat in social anxiety disorder (SAD), a common psychiatric disorder whose core elements are fears of social exclusion and negative evaluation. Previous research on social anxiety has shown increased cerebral responses to negative facial or vocal expressions, as well as generally increased hemodynamic responses to voices and faces. However, it is unclear whether the cerebral process of face-voice integration is also altered in SAD. Applying functional magnetic resonance imaging, we investigated the correlates of the audiovisual integration of dynamic faces and voices in SAD as compared to healthy individuals. In the bilateral midsections of the superior temporal sulcus (STS), increased integration effects were observed in SAD, driven by greater activation increases during audiovisual stimulation as compared to auditory stimulation. This effect was accompanied by increased functional connectivity with the visual association cortex and a more anterior position of the individual integration maxima along the STS in SAD. These findings demonstrate that the audiovisual integration of facial and vocal cues in SAD is systematically altered not only with regard to intensity and connectivity but also with regard to the individual location of the integration areas within the STS. These combined findings offer a novel perspective on the neuronal representation of social signal processing in individuals suffering from SAD.

    Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy

    Background: Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge to automated methods, which identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Another useful source of information for disambiguation is metadata. Here, we systematically compare three approaches to word sense disambiguation, which use ontologies and metadata, respectively.
    Results: The 'Closest Sense' method assumes that the ontology defines multiple senses of the term. It computes the shortest path of co-occurring terms in the document to one of these senses. The 'Term Cooc' method defines a log-odds ratio for co-occurring terms including co-occurrences inferred from the ontology structure. The 'MetaData' approach trains a classifier on metadata. It does not require any ontology, but requires training data, which the other methods do not. To evaluate these approaches we defined a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. All approaches over all conditions achieve 80% success rate on average. The 'MetaData' approach performed best with 96%, when trained on high-quality data. Its performance deteriorates as the quality of the training data decreases. The 'Term Cooc' approach performs better on Gene Ontology (92% success) than on MeSH (73% success), as MeSH is not a strict is-a/part-of hierarchy but rather a loose is-related-to hierarchy. The 'Closest Sense' approach achieves on average 80% success rate.
    Conclusion: Metadata is valuable for disambiguation, but requires high-quality training data. Closest Sense requires no training, but a large, consistently modelled ontology, which are two opposing conditions. Term Cooc achieves greater than 90% success given a consistently modelled ontology. Overall, the results show that well-structured ontologies can play a very important role in improving disambiguation.
    Availability: The three benchmark datasets created for the purpose of disambiguation are available in Additional file 1 (1471-2105-10-28-S1.txt), "Benchmark datasets used in the experiments". The three corpora (High quality/Low quantity corpus; Medium quality/Medium quantity corpus; Low quality/High quantity corpus) are given in the form of PubMed identifiers (PMID) for True/False cases for the 7 ambiguous terms examined (GO/MeSH/UMLS identifiers are also given).
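The 'Closest Sense' idea from this abstract, picking the sense with the shortest ontology path to a term that co-occurs in the document, can be sketched as follows. This is only an illustration under stated assumptions, not the paper's method in detail: the toy ontology, its term names, the undirected treatment of edges, and the tie-breaking rule (first listed sense wins) are all hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def undirected(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from (term, term) ontology edges."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_path_lengths(graph: Graph, source: str) -> Dict[str, int]:
    """Breadth-first search: number of edges from source to each reachable term."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closest_sense(
    graph: Graph, senses: List[str], context_terms: Iterable[str]
) -> Optional[str]:
    """Return the sense closest to any co-occurring term, or None if none is reachable."""
    context = list(context_terms)
    best_sense, best_dist = None, float("inf")
    for sense in senses:
        dist = shortest_path_lengths(graph, sense)
        reachable = [dist[term] for term in context if term in dist]
        if reachable and min(reachable) < best_dist:
            best_sense, best_dist = sense, min(reachable)
    return best_sense


# Hypothetical toy ontology with two senses of the ambiguous label "development".
TOY_EDGES = [
    ("biological process", "developmental process"),
    ("developmental process", "embryo development"),
    ("biological process", "transport"),
    ("technology", "software development"),
    ("software development", "programming"),
]

if __name__ == "__main__":
    toy_graph = undirected(TOY_EDGES)
    toy_senses = ["developmental process", "software development"]
    print(closest_sense(toy_graph, toy_senses, ["embryo development"]))
```

The sketch needs no training data, which matches the trade-off the abstract describes, but its result depends entirely on how completely and consistently the ontology graph is modelled.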

    Participatory design of a usable mobile assistance system for maintenance workers

    The use of mobile production assistance systems creates new requirements for the design of human-machine interfaces (HMIs). Such HMIs comprise a graphical user interface (GUI) provided by the software and a tangible human-machine interface (tHMI) provided by hardware functional and control elements. A usable design of these HMIs offers great potential for safe operation and increases their acceptance by users. Building on the methods of usability engineering, the phases of the user-centred development of a usable HMI for the Ressourcencockpit are presented. The basis for this is a requirements catalogue that summarizes the needs of maintenance workers, service technicians, and planning and maintenance managers. A participatory approach together with the users is also adopted for the iterative development, prototype design, and evaluation. As a result, high usability ratings have already been obtained for individual aspects of the design and for the assembled geometry prototype

    Extending ontologies by finding siblings using set expansion techniques

    Motivation: Ontologies are an everyday tool in biomedicine to capture and represent knowledge. However, many ontologies lack a high degree of coverage in their domain and need to improve their overall quality and maturity. Automatically extending sets of existing terms will enable ontology engineers to systematically improve text-based ontologies level by level

    A compact ion-trap quantum computing demonstrator

    Quantum information processing is steadily progressing from a purely academic discipline towards applications throughout science and industry. Transitioning from lab-based, proof-of-concept experiments to robust, integrated realizations of quantum information processing hardware is an important step in this process. However, the nature of traditional laboratory setups does not offer itself readily to scaling up system sizes or allow for applications outside of laboratory-grade environments. This transition requires overcoming challenges in engineering and integration without sacrificing the state-of-the-art performance of laboratory implementations. Here, we present a 19-inch rack quantum computing demonstrator based on $^{40}\textrm{Ca}^+$ optical qubits in a linear Paul trap to address many of these challenges. We outline the mechanical, optical, and electrical subsystems. Further, we describe the automation and remote access components of the quantum computing stack. We conclude by describing characterization measurements relevant to digital quantum computing including entangling operations mediated by the Mølmer-Sørensen interaction. Using this setup we produce maximally-entangled Greenberger-Horne-Zeilinger states with up to 24 ions without the use of post-selection or error mitigation techniques; on par with well-established conventional laboratory setups
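For reference, the Greenberger-Horne-Zeilinger state mentioned in this abstract has the standard textbook form below. The qubit basis labels $|0\rangle$ and $|1\rangle$ are a notational assumption; the abstract does not say how the two levels of the optical qubit are labelled.

```latex
\[
  |\mathrm{GHZ}_N\rangle
    = \frac{1}{\sqrt{2}}
      \left( |0\rangle^{\otimes N} + |1\rangle^{\otimes N} \right),
  \qquad N \le 24 \ \text{ions in the reported experiments.}
\]
```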

    Engaging with terminology in the multilingual classroom: Teachers’ practices for bridging the gap between L1 lectures and English reading

    In some academic settings where English is not the first language it is nonetheless common for reading to be assigned in English, and the expectation is often that students will acquire subject terminology incidentally in the first language as well as in English as a result of listening and reading. It is then a prerequisite that students notice and engage with terminology in both languages. To this end, teachers’ classroom practices for making students attend to and engage with terms are crucial for furthering students’ vocabulary competence in two languages. Using transcribed video recordings of eight undergraduate lectures from two universities in such a setting, this paper provides a comprehensive picture of what teachers ‘do’ with terminology during a lecture, i.e., how terms are allowed to feature in the classroom discourse. It is established, for example, that teachers nearly always employ some sort of emphatic practice when using a term in a lecture. However, the repertoire of such practices is limited. Further, teachers rarely adapt their repertoires to cater to the special needs arguably required in these settings, or to exploit the affordances of multilingual environments